A Mocktail of Source Code Representations
Efficient representation of source code is essential for various software
engineering tasks such as code search and code clone detection. One such
technique for representing source code involves extracting paths from the AST
and using a learning model to capture program properties. Code2vec is a
commonly used path-based approach that uses an attention-based neural network
to learn code embeddings which can then be used for various software
engineering tasks. However, this approach uses only ASTs and does not leverage
other graph structures such as Control Flow Graphs (CFG) and Program Dependency
Graphs (PDG). Similarly, most recent approaches for representing source code
still use AST and do not leverage semantic graph structures. Even though there
exists an integrated graph approach (Code Property Graph) for representing
source code, it has only been explored in the domain of software security.
Moreover, it does not leverage the paths from the individual graphs. In our
work, we extend the path-based approach code2vec to include the semantic graphs
(CFG and PDG) along with the AST, a combination that is still largely unexplored
in the domain of software engineering. We evaluate our approach on the task of
method naming
using a custom C dataset of 730K methods collected from 16 C projects from
GitHub. In comparison to code2vec, our approach improves the F1 Score by 11% on
the full dataset and up to 100% with individual projects. We show that semantic
features from the CFG and PDG paths are indeed helpful. We envision that
looking at a mocktail of source code representations for various software
engineering tasks can lay the foundation for a new line of research and an
overhaul of existing research.
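To make the path-based idea in the abstract above concrete, here is a minimal sketch of extracting leaf-to-leaf AST paths in the style of code2vec path contexts. It is an illustration under stated assumptions, not the authors' pipeline: it uses Python's standard `ast` module on Python source (the paper works on C), it covers only the AST (no CFG or PDG paths), and the names `TERMINALS`, `token_of`, `path_contexts`, the `max_length` cutoff, and the `^` (up) / `_` (down) separators are all choices made for this example.

```python
import ast
from itertools import combinations

# Assumption: these node types count as terminals (leaves that carry a token).
TERMINALS = (ast.Name, ast.Constant, ast.arg)


def token_of(node):
    """Return the surface token carried by a terminal node."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    return repr(node.value)  # ast.Constant


def path_contexts(source, max_length=8):
    """Return (token_a, path, token_b) triples for every pair of terminals.

    The path climbs from the first terminal to the lowest common ancestor
    ('^' separators) and then descends to the second terminal ('_' separators).
    Paths with more than max_length nodes are skipped.
    """
    tree = ast.parse(source)
    leaves = []  # (token, list of nodes from the root down to the terminal)

    def walk(node, trail):
        trail = trail + [node]
        if isinstance(node, TERMINALS):
            leaves.append((token_of(node), trail))
            return  # do not descend below a terminal
        for child in ast.iter_child_nodes(node):
            walk(child, trail)

    walk(tree, [])

    contexts = []
    for (tok_a, trail_a), (tok_b, trail_b) in combinations(leaves, 2):
        # Length of the shared root prefix; the last shared node is the
        # lowest common ancestor of the two terminals.
        i = 0
        while i < min(len(trail_a), len(trail_b)) and trail_a[i] is trail_b[i]:
            i += 1
        up = [type(n).__name__ for n in reversed(trail_a[i:])]
        top = type(trail_a[i - 1]).__name__
        down = [type(n).__name__ for n in trail_b[i:]]
        if len(up) + 1 + len(down) > max_length:
            continue
        path = "^".join(up + [top]) + "_" + "_".join(down)
        contexts.append((tok_a, path, tok_b))
    return contexts


if __name__ == "__main__":
    for context in path_contexts("def add(a, b):\n    return a + b\n"):
        print(context)
```

In a code2vec-style model, each such triple would be embedded and the triples of a method combined with attention into a single code vector. Extending the sketch in the direction the abstract describes would mean producing analogous paths from CFG and PDG traversals, which this example does not attempt.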
Diversity in Software Engineering Conferences and Journals
Diversity with respect to ethnicity and gender has been studied in
open-source and industrial settings for software development. Publication
avenues such as academic conferences and journals contribute to the growing
technology industry. However, there have been very few diversity-related
studies conducted in the context of academia. In this paper, we study the
ethnic, gender, and geographical diversity of the authors published in Software
Engineering conferences and journals. We provide a systematic quantitative
analysis of the diversity of publications and organizing and program committees
of three top conferences and two top journals in Software Engineering, which
indicates the existence of bias and entry barriers towards authors and
committee members belonging to certain ethnicities, genders, and/or geographical
locations in Software Engineering conferences and journal publications. For our
study, we analyse publication data (accepted authors) and committee data (Program
and Organizing Committees, Journal Editorial Boards) from the conferences ICSE,
FSE, and ASE and the journals IEEE TSE and ACM TOSEM from 2010 to 2022. The
analysis of the data shows that across participants and committee members,
some communities are consistently and significantly underrepresented,
for example, those from countries in Africa, South
America, and Oceania. However, a correlation study between the diversity of the
committees and the participants did not yield any conclusive evidence.
Furthermore, there is no conclusive evidence that papers with White authors or
male authors were more likely to be cited. Finally, we see an improvement in
the ethnic diversity of the authors over the years 2010-2022 but not in gender
or geographical diversity.
Comment: 13 pages, 10 figures, 4 tables
Android Access Control Recommendation as a Deep Learning Task
Android enforces access control checks to protect sensitive framework APIs. If not properly protected, framework APIs can open the door for malicious apps to access sensitive resources without having the necessary privileges. Unfortunately, as reported in the existing literature, such access control anomalies are prevalent in Android APIs, notably those introduced by customization parties. Therefore, various solutions have been proposed to detect anomalies, particularly those due to inconsistencies in the enforcement of access checks across the Android framework(s). The solutions can be largely divided into two categories: convergence-based techniques which rely on the convergence of two APIs on similar resources, and probabilistic approaches which incorporate additional hints in the form of manually defined structural and semantic code constructs. In this paper, we are motivated by the promising application of using code constructs, beyond convergence as proposed by the probabilistic approaches, to recommend access control enforcement and detect inconsistencies.
Specifically, we propose a deep learning-based approach that aims to automatically learn the correspondence between various code constructs and access control requirements. To this end, we fine-tune CodeBERT on statically derived features from the Android Open Source Project (AOSP). Our feature engineering process addresses various peculiarities that characterize Android implementations. The resulting fine-tuned model can be queried to recommend access control for vendor-customized APIs.
The fine-tuned model achieves an accuracy of 93%, a precision of 91%, and a recall of 92% on the AOSP data. Additionally, our evaluation of custom ROMs shows that the model is able to not only rediscover previously reported inconsistencies but also discover new ones.
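The convergence-based baseline mentioned in the abstract above (two APIs that reach the same resource should enforce comparable access checks) can be sketched in a few lines. This is a toy illustration of that prior idea only, not the paper's CodeBERT-based model. The function `find_inconsistencies`, the input shape, and every API, resource, and permission name in the example are hypothetical placeholders.

```python
from collections import defaultdict


def find_inconsistencies(api_facts):
    """Group APIs by the resource they reach and flag disagreeing checks.

    api_facts maps an API name to a pair (resource, permissions), where
    permissions is the set of access checks the API enforces before
    touching the resource. Returns {resource: {api: frozenset(permissions)}}
    for every resource whose APIs do not all enforce the same set.
    """
    by_resource = defaultdict(dict)
    for api, (resource, permissions) in api_facts.items():
        by_resource[resource][api] = frozenset(permissions)

    report = {}
    for resource, apis in by_resource.items():
        # More than one distinct permission set means the APIs converge
        # on the same resource with different protection.
        if len(set(apis.values())) > 1:
            report[resource] = apis
    return report


if __name__ == "__main__":
    # Hypothetical facts, as a static analysis might produce them.
    facts = {
        "FrameworkService.readThing": ("thing_store", {"PERM_READ_THING"}),
        "VendorService.readThingFast": ("thing_store", set()),
        "FrameworkService.readOther": ("other_store", {"PERM_READ_OTHER"}),
    }
    for resource, apis in find_inconsistencies(facts).items():
        print(resource, dict(apis))
```

A check like this only fires when two APIs demonstrably converge on one resource, which is the limitation that motivates using richer code constructs as hints.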